{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "7f9d3587-a07f-4657-a57d-8d28619081d0",
      "metadata": {
        "id": "XIyP_0r6zuVc"
      },
      "source": [
        "<!-- Banner Image -->\n",
        "<img src=\"https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brev-xmas-3.png\" width=\"100%\">\n",
        "\n",
        "<!-- Links -->\n",
        "<center>\n",
        "  <a href=\"https://brev.nvidia.com\" style=\"color: #06b6d4;\">Console</a> •\n",
        "  <a href=\"https://developer.nvidia.com/brev\" style=\"color: #06b6d4;\">Docs</a> •\n",
        "  <a href=\"/\" style=\"color: #06b6d4;\">Templates</a> •\n",
        "  <a href=\"https://discord.gg/NVDyv7TUgJ\" style=\"color: #06b6d4;\">Discord</a>\n",
        "</center>\n",
        "\n",
        "# Running Meta's Llama 2 🤙\n",
        "\n",
        "Welcome!\n",
        "\n",
        "In this notebook and tutorial, we will download & run Meta's Llama 2 models (7B, 13B, 70B, 7B-chat, 13B-chat, and/or 70B-chat). If you're looking for a fine-tuning guide, follow **[this guide instead](https://github.com/brevdev/notebooks/blob/main/llama2-finetune.ipynb)**. \n",
        "\n",
        "### Help us make this tutorial better! Please provide feedback on the [Discord channel](https://discord.gg/y9428NwTh3) or on [X](https://x.com/harperscarroll)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "4c180393-2631-4e44-97bd-47b8cced2060",
      "metadata": {
        "id": "SSBw-KpkyRga"
      },
      "source": [
        "#### Before we begin: A note on OOM errors\n",
        "\n",
        "If you get an error like this: `OutOfMemoryError: CUDA out of memory`, tweak your parameters to make the model less computationally intensive. I will help guide you through that in this guide, and if you have any additional questions you can reach out on the [Discord channel](https://discord.gg/y9428NwTh3) or on [X](https://x.com/harperscarroll).\n",
        "\n",
        "To re-try after you tweak your parameters, open a Terminal ('Launcher' or '+' in the nav bar above -> Other -> Terminal) and run the command `nvidia-smi`. Then find the process ID `PID` under `Processes` and run the command `kill [PID]`. You will need to re-start your notebook from the beginning. (There may be a better way to do this... if so please do let me know!)\n",
        "\n",
        "### Let's begin!"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ae808e1c-21bc-4b00-ae44-bbfd7d1ed82e",
      "metadata": {
        "id": "hWI-uRLEyRgb"
      },
      "source": [
        "## 1. Get a GPU\n",
        "\n",
        "I used a GPU and dev environment from [brev.dev](https://developer.nvidia.com/brev). Click the badge below to get your preconfigured instance:\n",
        "\n",
        "[![](https://brev-assets.s3.us-west-1.amazonaws.com/nv-lb-dark.svg)](https://brev.nvidia.com/environment/new?instance=T4:g4dn.12xlarge&diskStorage=512&name=llama2&file=https://github.com/brevdev/notebooks/raw/main/llama2.ipynb&python=3.10&cuda=12.0.1)\n",
        "\n",
        "Once you've checked out your machine and landed in your instance page, select the specs you'd like (I used Python 3.10 and CUDA 12.0.1; these should be preconfigured for you if you use the badge above) and click the \"Build\" button to build your verb container. Give this a few minutes.\n",
        "\n",
        "A few minutes after your model has started Running, click the 'Notebook' button on the top right of your screen once it illuminates (you may need to refresh the screen). You will be taken to a Jupyter Lab environment, where you can upload this Notebook.\n",
        "\n",
        "Note: You can connect your cloud credits (AWS or GCP) by clicking \"Org: \" on the top right, and in the panel that slides over, click \"Connect AWS\" or \"Connect GCP\" under \"Connect your cloud\" and follow the instructions linked to attach your credentials."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e51b1aa8-1602-4266-9af7-ff05ea060bc3",
      "metadata": {},
      "source": [
        "## 2. Sign up for Meta Llama 2 Access\n",
        "\n",
        "Sign up for access to the model [here](https://ai.meta.com/llama/). You will (quickly) receive an email with a URL you'll need to use in the next step. \n",
        "\n",
        "## 3. Set Up Llama 2\n",
        "Run the cell below to pull the Llama Github repo and install the requirements."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "f7347593-a776-4fd6-88bb-737e590fc2ac",
      "metadata": {},
      "outputs": [],
      "source": [
        "!git clone https://github.com/facebookresearch/llama.git\n",
        "import os\n",
        "os.chdir('/home/ubuntu/llama')\n",
        "!pip install -r requirements.txt"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c3cbd9a4-728a-46ae-887b-4fdfdb9b30e2",
      "metadata": {},
      "source": [
        "Now open a Terminal ('Launcher' or '+' in the nav bar above -> Other -> Terminal) and enter the command:\n",
        "\n",
        "`cd llama && bash download.sh`\n",
        "\n",
        "This will take a while, especially if you download >1 model or a larger model."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9e75ffa4-1d51-411b-b82a-3d7d54ccc98e",
      "metadata": {},
      "source": [
        "## 4. Run the Model! \n",
        "\n",
        " Once this is done, you can run the cell below for inference. You can replace:\n",
        "- `ckpt_dir` with the model you'd like to run (the directory name for 7B, 13B, 70B, 7B-chat, 13B-chat, or 70B-chat)\n",
        "- `example_text_completion.py` with your own version, or change the prompts inside that file\n",
        "- `max_seq_length` to change the output sequence length\n",
        "- `max_batch_size` the max number of prompts to perform inference on per batch"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "60c00f15-9bab-4dea-9ce5-742f387b899e",
      "metadata": {},
      "outputs": [],
      "source": [
        "!torchrun --nproc_per_node 1 example_text_completion.py \\\n",
        "    --ckpt_dir llama-2-7b/ \\\n",
        "    --tokenizer_path tokenizer.model \\\n",
        "    --max_seq_len 128 --max_batch_size 4"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "30885538-b32e-4911-8a87-8023ecdaaf26",
      "metadata": {
        "id": "VCJnpZoayRgq"
      },
      "source": [
        "### Sweet... it worked! \n",
        "\n",
        "I hope you enjoyed this tutorial on running Llama 2. If you have any questions, feel free to reach out to me on [X](https://x.com/harperscarroll) or on the [Discord](https://discord.gg/T9bUNqMS8d).\n",
        "\n",
        "🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
